Kaunas
As NATO-Russia tensions rise, Lithuania prepares for conflict
Lithuania, a small Baltic state bordering Belarus and Russia's Kaliningrad, is adapting to new tensions between NATO and Moscow. A member of the Lithuanian Riflemen's Union takes part in a military exercise in central Lithuania [Nils Adler/Al Jazeera] Two members of the Lithuanian Riflemen's Union take part in a military exercise in central Lithuania [Nils Adler/Al Jazeera] On a nearby building is an illuminated decorative Z, a symbol used to show support for the Russian military's full-scale invasion of Ukraine, which began in February 2022.
Machine-learning for photoplethysmography analysis: Benchmarking feature, image, and signal-based approaches
Moulaeifard, Mohammad, Coquelin, Loic, Rinkevičius, Mantas, Sološenko, Andrius, Pfeffer, Oskar, Bench, Ciaran, Hegemann, Nando, Vardanega, Sara, Nandi, Manasi, Alastruey, Jordi, Heiss, Christian, Marozas, Vaidotas, Thompson, Andrew, Aston, Philip J., Charlton, Peter H., Strodthoff, Nils
Photoplethysmography (PPG) is a widely used non-invasive physiological sensing technique, suitable for various clinical applications. Such clinical applications are increasingly supported by machine learning methods, raising the question of the most appropriate input representation and model choice. Comprehensive comparisons, in particular across different input representations, are scarce. We address this gap in the research landscape with a comprehensive benchmarking study covering three kinds of input representations (interpretable features, image representations, and raw waveforms) across prototypical regression and classification use cases: blood pressure and atrial fibrillation prediction. In both cases, the best results are achieved by deep neural networks operating on raw time series as input representations. Within this model class, the best results are achieved by modern convolutional neural networks (CNNs), but depending on the task setup, shallow CNNs are often also very competitive. We envision that these results will help researchers guide their choices on machine learning tasks for PPG data, even beyond the use cases presented in this work.
A Mobile Robotic Approach to Autonomous Surface Scanning in Legal Medicine
Grube, Sarah, Latus, Sarah, Fischer, Martin, Raudonis, Vidas, Heinemann, Axel, Ondruschka, Benjamin, Schlaefer, Alexander
Purpose: Comprehensive legal medicine documentation includes both an internal and an external examination of the corpse. Typically, this documentation is conducted manually during conventional autopsy. A systematic digital documentation would be desirable, especially for the external examination of wounds, which is becoming more relevant for legal medicine analysis. For this purpose, RGB surface scanning has been introduced. While a manual full surface scan using a handheld camera is time-consuming and operator dependent, floor or ceiling mounted robotic systems require substantial space and a dedicated room. Hence, we consider whether a mobile robotic system can be used for external documentation. Methods: We develop a mobile robotic system that enables full-body RGB-D surface scanning. Our work includes a detailed configuration space analysis to identify the environmental parameters that need to be considered to successfully perform a surface scan. We validate our findings through an experimental study in the lab and demonstrate the system's application in a legal medicine environment. Results: Our configuration space analysis shows that a good trade-off between coverage and time is reached with three robot base positions, leading to a coverage of 94.96 %. Experiments validate the effectiveness of the system in accurately capturing body surface geometry with an average surface coverage of 96.90 ± 3.16 % and 92.45 ± 1.43 % for a body phantom and actual corpses, respectively. Conclusion: This work demonstrates the potential of a mobile robotic system to automate RGB-D surface scanning in legal medicine, complementing the use of post-mortem CT scans for inner documentation. Our results indicate that the proposed system can contribute to more efficient and autonomous legal medicine documentation, reducing the need for manual intervention.
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey
Ni, Bo, Liu, Zheyuan, Wang, Leyao, Lei, Yongjia, Zhao, Yuying, Cheng, Xueqi, Zeng, Qingkai, Dong, Luna, Xia, Yinglong, Kenthapadi, Krishnaram, Rossi, Ryan, Dernoncourt, Franck, Tanjim, Md Mehrab, Ahmed, Nesreen, Liu, Xiaorui, Fan, Wenqi, Blasch, Erik, Wang, Yu, Jiang, Meng, Derr, Tyler
Retrieval-Augmented Generation (RAG) is an advanced technique designed to address the challenges of Artificial Intelligence-Generated Content (AIGC). By integrating context retrieval into content generation, RAG provides reliable and up-to-date external knowledge, reduces hallucinations, and ensures relevant context across a wide range of tasks. However, despite RAG's success and potential, recent studies have shown that the RAG paradigm also introduces new risks, including robustness issues, privacy concerns, adversarial attacks, and accountability issues. Addressing these risks is critical for future applications of RAG systems, as they directly impact their trustworthiness. Although various methods have been developed to improve the trustworthiness of RAG methods, there is a lack of a unified perspective and framework for research on this topic. Thus, in this paper, we aim to address this gap by providing a comprehensive roadmap for developing trustworthy RAG systems. We place our discussion around six key perspectives: reliability, privacy, safety, fairness, explainability, and accountability. For each perspective, we present a general framework and taxonomy, offering a structured approach to understanding the current challenges, evaluating existing solutions, and identifying promising future research directions. To encourage broader adoption and innovation, we also highlight the downstream applications where trustworthy RAG systems have a significant impact.
PCAP-Backdoor: Backdoor Poisoning Generator for Network Traffic in CPS/IoT Environments
Chathoth, Ajesh Koyatan, Lee, Stephen
The rapid expansion of connected devices has made them prime targets for cyberattacks. To address these threats, deep learning-based, data-driven intrusion detection systems (IDS) have emerged as powerful tools for detecting and mitigating such attacks. These IDSs analyze network traffic to identify unusual patterns and anomalies that may indicate potential security breaches. However, prior research has shown that deep learning models are vulnerable to backdoor attacks, where attackers inject triggers into the model to manipulate its behavior and cause misclassifications of network traffic. In this paper, we explore the susceptibility of deep learning-based IDSs to backdoor attacks in the context of network traffic analysis. We introduce PCAP-Backdoor, a novel technique that facilitates backdoor poisoning attacks on PCAP datasets. Our experiments on real-world Cyber-Physical Systems (CPS) and Internet of Things (IoT) network traffic datasets demonstrate that attackers can effectively backdoor a model by poisoning as little as 1% or less of the entire training dataset. Moreover, we show that an attacker can introduce a trigger into benign traffic during model training yet cause the backdoored model to misclassify malicious traffic when the trigger is present. Finally, we highlight the difficulty of detecting this trigger-based backdoor, even when using existing backdoor defense techniques.
Automated Classification of Cybercrime Complaints using Transformer-based Language Models for Hinglish Texts
Rani, Nanda, Singh, Divyanshu, Saha, Bikash, Shukla, Sandeep Kumar
The rise in cybercrime and the complexity of multilingual and code-mixed complaints present significant challenges for law enforcement and cybersecurity agencies. These organizations need automated, scalable methods to identify crime types, enabling efficient processing and prioritization of large complaint volumes. Manual triaging is inefficient, and traditional machine learning methods fail to capture the semantic and contextual nuances of textual cybercrime complaints. Moreover, the lack of publicly available datasets and privacy concerns hinder research into robust solutions. To address these challenges, we propose a framework for automated cybercrime complaint classification. The framework leverages Hinglish-adapted transformers, such as HingBERT and HingRoBERTa, to handle code-mixed inputs effectively. We employ the real-world dataset provided by the Indian Cybercrime Coordination Centre (I4C) during the CyberGuard AI Hackathon 2024. We employ a data augmentation method based on an open-source GenAI model to address class imbalance. We also employ privacy-aware preprocessing to ensure compliance with ethical standards while maintaining data integrity. Our solution achieves significant performance improvements, with HingRoBERTa attaining an accuracy of 74.41% and an F1-score of 71.49%. We also develop a ready-to-use tool by integrating a Django REST backend with a modern frontend. The developed tool is scalable and ready for real-world deployment in platforms like the National Cyber Crime Reporting Portal. This work bridges critical gaps in cybercrime complaint management, offering a scalable, privacy-conscious, and adaptable solution for modern cybersecurity challenges.
GrEmLIn: A Repository of Green Baseline Embeddings for 87 Low-Resource Languages Injected with Multilingual Graph Knowledge
Gurgurov, Daniil, Kumar, Rishu, Ostermann, Simon
Contextualized embeddings based on large language models (LLMs) are available for various languages, but their coverage is often limited for lower-resourced languages. Using LLMs for such languages is often difficult due to a high computational cost, not only during training but also during inference. Static word embeddings are much more resource-efficient ("green"), and thus still provide value, particularly for very low-resource languages. There is, however, a notable lack of comprehensive repositories with such embeddings for diverse languages. To address this gap, we present GrEmLIn, a centralized repository of green, static baseline embeddings for 87 mid- and low-resource languages. We compute GrEmLIn embeddings with a novel method that enhances GloVe embeddings by integrating multilingual graph knowledge, which makes our static embeddings competitive with LLM representations, while being parameter-free at inference time. Our experiments demonstrate that GrEmLIn embeddings outperform state-of-the-art contextualized embeddings from E5 on the task of lexical similarity. They remain competitive in extrinsic evaluation tasks like sentiment analysis and natural language inference, with average performance gaps of just 5-10% or less compared to state-of-the-art models, given a sufficient vocabulary overlap with the target task, and underperform only on topic classification. Our code and embeddings are publicly available at https://huggingface.co/DFKI.
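As an illustration of why static embeddings are cheap at inference time, the following is a minimal sketch of a lexical similarity lookup: each word maps to a fixed vector, and similarity is a cosine between two table entries. The three-dimensional vectors and the vocabulary are invented for the example and are not GrEmLIn embeddings; the abstract also does not state which similarity measure the authors use, so cosine similarity is an assumption here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embedding table (not real GrEmLIn vectors).
embeddings = {
    "cat": [1.0, 0.0, 1.0],
    "dog": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
}

# A static lookup needs no model forward pass, only a table read.
sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(f"cat/dog: {sim_cat_dog:.3f}, cat/car: {sim_cat_car:.3f}")
```

In a lexical similarity benchmark, scores like these would typically be compared against human similarity ratings for the same word pairs.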
Annotations for Exploring Food Tweets From Multiple Aspects
Rikters, Matīss, Marrese-Taylor, Edison, Vīksna, Rinalds
This research builds upon the Latvian Twitter Eater Corpus (LTEC), which is focused on the narrow domain of tweets related to food, drinks, eating and drinking. LTEC has been collected for more than 12 years and has reached almost 3 million tweets, with basic information as well as extended automatically and manually annotated metadata. In this paper we supplement the LTEC with manually annotated subsets of evaluation data for machine translation, named entity recognition, timeline-balanced sentiment analysis, and text-image relation classification. We experiment with each of the data sets using baseline models and highlight future challenges for various modelling approaches.
The Helicobacter pylori AI-Clinician: Harnessing Artificial Intelligence to Personalize H. pylori Treatment Recommendations
Higgins, Kyle, Nyssen, Olga P., Southern, Joshua, Laponogov, Ivan, CONSORTIUM, AIDA, Veselkov, Dennis, Gisbert, Javier P., Kanonnikoff, Tania Fleitas, Veselkov, Kirill
Infecting roughly 1 in 2 individuals globally, H. pylori is the leading cause of peptic ulcer disease, chronic gastritis, and gastric cancer. To investigate whether personalized treatments would be optimal for patients suffering from infection, we developed the H. pylori AI-clinician recommendation system. This system was trained on data from tens of thousands of H. pylori-infected patients from Hp-EuReg, orders of magnitude more than a single real-world clinician would encounter. We first used a simulated dataset and demonstrated the ability of our AI Clinician method to identify patient subgroups that would benefit from differential optimal treatments. Next, we trained the AI Clinician on Hp-EuReg, demonstrating that, on average, the AI Clinician reproduces known quality estimates of treatment decision making, for example bismuth and quadruple therapies out-performing triple, with longer durations and higher dose proton pump inhibitor (PPI) showing higher quality estimation on average. Next, we demonstrated that treatment was optimized by recommended personalized therapies in patient subsets, where 65% of patients were recommended a bismuth therapy of either metronidazole, tetracycline, and bismuth salts with PPI, or bismuth quadruple therapy with clarithromycin, amoxicillin, and bismuth salts with PPI, and 15% of patients were recommended a quadruple non-bismuth therapy of clarithromycin, amoxicillin, and metronidazole with PPI. Finally, we determined trends in patient variables driving the personalized recommendations using random forest modelling. With around half of the world likely to experience H. pylori infection at some point in their lives, the identification of personalized optimal treatments will be crucial in both gastric cancer prevention and quality of life improvements for countless individuals worldwide.
Deep Learning for Fetal Inflammatory Response Diagnosis in the Umbilical Cord
Ayad, Marina A., Nateghi, Ramin, Sharma, Abhishek, Chillrud, Lawrence, Seesillapachai, Tilly, Cooper, Lee A. D., Goldstein, Jeffery A.
Inflammation of the umbilical cord can be seen as a result of ascending intrauterine infection or other inflammatory stimuli. Acute fetal inflammatory response (FIR) is characterized by infiltration of the umbilical cord by fetal neutrophils, and can be associated with neonatal sepsis or fetal inflammatory response syndrome. Recent advances in deep learning in digital pathology have demonstrated favorable performance across a wide range of clinical tasks, such as diagnosis and prognosis. In this study we classified FIR from whole slide images (WSI). We digitized 4100 histological slides of umbilical cord stained with hematoxylin and eosin (H&E) and extracted placental diagnoses from the electronic health record. We built models using attention-based whole slide learning. We compared features extracted by a model (ConvNeXtXLarge) pretrained on non-medical images (ImageNet) with features from one pretrained using histopathology images (UNI). We trained multiple iterations of each model and combined them into an ensemble. The predictions from the ensemble of models trained using UNI achieved an overall balanced accuracy of 0.836 on the test dataset. In comparison, the ensembled predictions using ConvNeXtXLarge had a lower balanced accuracy of 0.7209. Heatmaps generated from the top-accuracy model appropriately highlighted arteritis in cases of FIR 2. In FIR 1, the highest performing model assigned high attention to areas of activated-appearing stroma in Wharton's Jelly. However, other high-performing models assigned attention to umbilical vessels. We developed models for diagnosis of FIR from placental histology images, helping reduce interobserver variability among pathologists. Future work may examine the utility of these models for identifying infants at risk of systemic inflammatory response or early onset neonatal sepsis.
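For readers unfamiliar with the metric reported above, balanced accuracy is the mean of per-class recall, which keeps a majority class from dominating the score. The sketch below is illustrative only: the labels are invented, the class coding (0 = no FIR, 1 = FIR 1, 2 = FIR 2) is an assumption, and none of the numbers come from the study.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical labels with an assumed coding: 0 = no FIR, 1 = FIR 1, 2 = FIR 2.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 0, 2, 2]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.3f}, plain accuracy: {plain:.3f}")
```

With these toy labels, the single missed minority-class case lowers balanced accuracy more than plain accuracy, which is why the metric suits imbalanced diagnostic classes.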